Word Frequency


Method

This plot shows only the most frequently used words across the entire corpus of tweets.
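The counting step behind such a plot can be sketched as follows. This is a minimal illustration only, not the code used to produce the plot: the function name `top_words`, the simple regular-expression tokenizer, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

def top_words(tweets, n=10):
    """Return the n most frequent lowercase words across all tweets."""
    counts = Counter()
    for text in tweets:
        # Assumed tokenizer: runs of letters, digits and apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts.most_common(n)

# Made-up tweets, for illustration only.
sample_tweets = [
    "Parliament debates the budget",
    "Budget vote delayed in parliament",
    "Weather warning issued",
]
top = top_words(sample_tweets, n=2)
print(top)
```

In practice, common stop words would usually be filtered out before counting; whether that was done here is not stated.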

Word Frequency Per Media House


Method

This plot shows the most frequently used words across the entire corpus of tweets, grouped by Twitter handle.
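Grouping the counts by handle could look something like the sketch below. The `(handle, text)` input shape, the function name `top_words_by_handle`, the placeholder handles, and the sample tweets are assumptions for illustration, not the original implementation.

```python
import re
from collections import Counter, defaultdict

def top_words_by_handle(tweets, n=10):
    """tweets: iterable of (handle, text) pairs.
    Returns {handle: [(word, count), ...]} with the n most frequent words per handle."""
    counts = defaultdict(Counter)
    for handle, text in tweets:
        # Keep one word counter per Twitter handle.
        counts[handle].update(re.findall(r"[a-z0-9']+", text.lower()))
    return {handle: c.most_common(n) for handle, c in counts.items()}

# Made-up handles and tweets, for illustration only.
sample = [
    ("handle_a", "budget vote budget"),
    ("handle_b", "weather warning"),
    ("handle_a", "budget delayed"),
]
per_handle = top_words_by_handle(sample, n=1)
print(per_handle["handle_a"])
```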

Bi-Gram Frequency


Method

Each word is paired with the word that follows it to form two-word groups (bi-grams), and the occurrences of each group are then counted.
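The pairing and counting described above can be sketched as below. It assumes bi-grams are formed within each tweet (never across tweet boundaries) and uses a simple regular-expression tokenizer; both choices, the function name `bigram_counts`, and the sample tweets are illustrative assumptions.

```python
import re
from collections import Counter

def bigram_counts(tweets):
    """Count adjacent word pairs (bi-grams) within each tweet."""
    counts = Counter()
    for text in tweets:
        words = re.findall(r"[a-z0-9']+", text.lower())
        # zip pairs each word with the word that follows it.
        counts.update(zip(words, words[1:]))
    return counts

# Made-up tweets, for illustration only.
sample_tweets = ["Load shedding returns", "Load shedding suspended"]
bigrams = bigram_counts(sample_tweets)
print(bigrams.most_common(1))
```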

Optimal K For Topic Model


Method

The “ldatuning” package computes four metrics (“Griffiths2004”, “CaoJuan2009”, “Arun2010” and “Deveaud2014”) that help select a suitable number of topics for an LDA model. The number of CPU cores to use can be specified to speed up the computation. The larger the dataset, the longer the results take to calculate. For more information on this method and the various metrics used to obtain the optimal K topics, visit: https://cran.r-project.org/web/packages/ldatuning/vignettes/topics.html or https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/
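When reading the four metrics together, “CaoJuan2009” and “Arun2010” are to be minimized, while “Griffiths2004” and “Deveaud2014” are to be maximized. One possible way to combine them into a single ranking is sketched below. This is not part of “ldatuning”, which only computes and plots the metrics. The averaging heuristic, the function name `rank_k`, and all the numbers are assumptions made up for illustration.

```python
# Metric directions as described in the ldatuning vignette.
MINIMIZE = {"CaoJuan2009", "Arun2010"}
MAXIMIZE = {"Griffiths2004", "Deveaud2014"}

def rank_k(scores):
    """scores: {metric: {k: value}}.
    Returns the candidate K values sorted best first by the mean of the
    min-max-normalised metric scores (1.0 = best on every metric)."""
    per_k = {}
    for metric, by_k in scores.items():
        lo, hi = min(by_k.values()), max(by_k.values())
        span = (hi - lo) or 1.0  # avoid division by zero for a flat metric
        for k, value in by_k.items():
            norm = (value - lo) / span
            if metric in MINIMIZE:
                norm = 1.0 - norm  # flip so that higher is always better
            per_k.setdefault(k, []).append(norm)
    means = {k: sum(v) / len(v) for k, v in per_k.items()}
    return sorted(means, key=means.get, reverse=True)

# Made-up metric values, for illustration only (not results from this corpus).
example_scores = {
    "Griffiths2004": {2: -1200.0, 4: -1000.0, 6: -1100.0},
    "CaoJuan2009": {2: 0.30, 4: 0.10, 6: 0.20},
    "Arun2010": {2: 50.0, 4: 20.0, 6: 35.0},
    "Deveaud2014": {2: 1.0, 4: 2.0, 6: 1.5},
}
ranking = rank_k(example_scores)
print(ranking)
```

When the metrics disagree, the plotted curves are still worth inspecting by eye, since a single averaged score can hide that disagreement.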

References

  1. Rajkumar Arun, V. Suresh, C. E. Veni Madhavan, and M. N. Narasimha Murthy. 2010. On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Advances in knowledge discovery and data mining, Mohammed J. Zaki, Jeffrey Xu Yu, Balaraman Ravindran and Vikram Pudi (eds.). Springer Berlin Heidelberg, 391–402. http://doi.org/10.1007/978-3-642-13657-3_43

  2. Cao Juan, Xia Tian, Li Jintao, Zhang Yongdong, and Tang Sheng. 2009. A density-based method for adaptive LDA model selection. Neurocomputing (16th European Symposium on Artificial Neural Networks 2008) 72, 7–9: 1775–1781. http://doi.org/10.1016/j.neucom.2008.06.011

  3. Romain Deveaud, Éric SanJuan, and Patrice Bellot. 2014. Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique 17, 1: 61–84. http://doi.org/10.3166/dn.17.1.61-84

  4. Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101, suppl 1: 5228–5235. http://doi.org/10.1073/pnas.0307752101

  5. Martin Ponweiser. 2012. Latent Dirichlet allocation in R. Retrieved from http://epub.wu.ac.at/id/eprint/3558

Topic Model For All Tweets


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 8, chosen using the method described above for identifying the optimal K value. The model's “beta” matrix is used to examine per-topic-per-word probabilities.
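Reading the “beta” matrix typically means listing the highest-probability words for each topic. A minimal sketch of that step follows, assuming the matrix is available as one row of word probabilities per topic, aligned with a vocabulary list. The function name `top_terms`, the tiny vocabulary, and the probabilities are made up for illustration.

```python
def top_terms(beta, vocab, n=5):
    """beta: one row per topic, each a list of per-word probabilities aligned with vocab.
    Returns, for each topic, the n (word, probability) pairs with the highest probability."""
    result = []
    for row in beta:
        ranked = sorted(zip(vocab, row), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:n])
    return result

# Made-up vocabulary and probabilities, for illustration only.
vocab = ["budget", "vote", "rain", "storm"]
beta = [
    [0.50, 0.30, 0.10, 0.10],  # topic 1
    [0.05, 0.05, 0.60, 0.30],  # topic 2
]
terms = top_terms(beta, vocab, n=2)
for topic_number, words in enumerate(terms, start=1):
    print(topic_number, words)
```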

Topic Model for Media 24


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Topic Model For EWNupdates


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Topic Model for eNCA


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Topic Model For SABC News


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 6. SABC News had the most tweets overall, and its topic model is the only one with 6 topics, as the settings used for the other media houses yielded mixed results. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Topic Model for GovernmentZA


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.